-
Notifications
You must be signed in to change notification settings - Fork 916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix call to thrust::reduce_by_key in argmin/argmax libcudf groupby #9263
Fix call to thrust::reduce_by_key in argmin/argmax libcudf groupby #9263
Conversation
Codecov Report
@@ Coverage Diff @@
## branch-21.10 #9263 +/- ##
================================================
- Coverage 10.85% 10.83% -0.03%
================================================
Files 115 116 +1
Lines 19158 18781 -377
================================================
- Hits 2080 2035 -45
+ Misses 17078 16746 -332
Continue to review full report at Codecov.
|
// The extra bounds checking is due to issue github.com/rapidsai/cudf/9156 and | ||
// github.com/NVIDIA/thrust/issues/1525 | ||
// where invalid random values may be passed here by thrust::reduce_by_key | ||
if (lhs < 0 || lhs >= d_col.size() || d_col.is_null(lhs)) { return rhs; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if rhs
here is also out of bound or null? Is returning out of bound or null desired?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the bounds checking is necessary seeing as this gets its values from a counting iterator which is always less than the size of d_col
.
For null, I do think it's ok to return the index of a null element. If both are null then either can be returned, the winning idx will later be removed when compared against an idx corresponding to valid value. And if the entire group contains nulls then it'll be nullified in the group mask generation step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bounds checking is necessary because the two issues mentioned in the comment above.The thrust::reduce_by_key
may actually pass an invalid (out-of-bounds) value because in certain cases it does not call through the iterator to retrieve the value it should pass. In these cases, it ignores the output but the damage is done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see. So bound check here is only to ensure the line 55 below works correctly, not to ensure the output to be used correctly later (since it is already handled).
// The extra bounds checking is due to issue github.com/rapidsai/cudf/9156 and | ||
// github.com/NVIDIA/thrust/issues/1525 | ||
// where invalid random values may be passed here by thrust::reduce_by_key | ||
if (lhs < 0 || lhs >= d_col.size() || d_col.is_null(lhs)) { return rhs; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the bounds checking is necessary seeing as this gets its values from a counting iterator which is always less than the size of d_col
.
For null, I do think it's ok to return the index of a null element. If both are null then either can be returned, the winning idx will later be removed when compared against an idx corresponding to valid value. And if the entire group contains nulls then it'll be nullified in the group mask generation step.
return thrust::get<1>(rhs); | ||
// The extra bounds checking is due to issue github.com/rapidsai/cudf/9156 and | ||
// github.com/NVIDIA/thrust/issues/1525 | ||
// where invalid random values may be passed here by thrust::reduce_by_key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// where invalid random values
Is there ever valid random values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could happen. The data could be randomly valid. Either way, the thrust::reduce_by_key
ignores the result -- it is just trying to fill a block/warp with minimal divergence.
@gpucibot merge |
Closes #9156
This PR simplifies the parameters when calling thrust::reduce_by_key for the argmin/argmax aggregations in cudf::groupby. The illegalMemoryAccess found in #9156 was due to invalid data being passed from thrust::reduce_by_key through to the BinaryPredicate function as documented in NVIDIA/cccl#789
The invalid data being passed is only a real issue for strings columns where the device pointer was neither nullptr nor a valid address. The new logic provides only size_type values to thrust::reduce_by_key so invalid values can only be out-of-bounds for the input column which is easily checked before retrieving the string_view objects within the ArgMin and ArgMax operators.
This the same as #9244 but based on 21.10